A Language Model for Parsing Very Long Chinese Sentences

Author

  • Hsin-Hsi Chen
Abstract

Corpus analysis shows that about seventy-five percent of Chinese sentences are composed of more than two sentence segments separated by commas or semicolons. A segment may be a sentence, a noun phrase, a verb phrase, an adjective phrase, an adverbial phrase, or a prepositional phrase. An NP segment may serve as the subject of the next segment or as the object of the previous segment. The empty category pro may also appear in a VP segment. The freedom with which pros can be used, the large number of segments, the variety of segment types, and the associativity problem all make sentence parsing difficult. Few parsing systems deal with these problems. This paper treats a segment as the basic parsing unit and uses characteristic words, verb subcategories, the topic chain, and some heuristic rules to link the segments into meaningful units. Both pro resolution and segment linking are useful in practical applications.
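The abstract does not spell out the linking algorithm, so the following is only a rough illustration of the two ideas it names: splitting a sentence into comma- or semicolon-delimited segments, and resolving a pro subject through the topic chain. It is a minimal sketch under stated assumptions, not the paper's method. The tagset (`N`, `V`, `D`), the three simplified segment types, the function names `split_segments`, `classify`, and `link_segments`, and the rule that an NP segment becomes the subject of the next segment are all hypothetical. The characteristic-word and verb-subcategory checks the paper relies on are omitted.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed segment delimiters: full-width and ASCII commas and semicolons.
_DELIMITERS = re.compile(r"[，,；;]")

# A tagged segment is a list of (word, tag) pairs. The tags "N" (noun),
# "V" (verb, including stative/adjectival verbs) and "D" (adverb) are a
# hypothetical minimal tagset used only for this illustration.
Tagged = list[tuple[str, str]]


def split_segments(sentence: str) -> list[str]:
    """Split a long sentence into segments at commas and semicolons."""
    return [part.strip() for part in _DELIMITERS.split(sentence) if part.strip()]


@dataclass
class Segment:
    words: Tagged
    kind: str               # "S" (overt subject), "VP" (subjectless), "NP" (no verb)
    subject: Optional[str]  # overt subject, antecedent of pro, or None


def classify(words: Tagged) -> tuple[str, Optional[str]]:
    """Return the simplified segment type and its overt subject or NP text."""
    tags = [tag for _, tag in words]
    if "V" not in tags:
        # No verb: treat the whole segment as a noun phrase.
        return "NP", "".join(word for word, _ in words)
    first_verb = tags.index("V")
    subject = "".join(word for word, tag in words[:first_verb] if tag == "N")
    # No noun before the first verb: assume the subject is an empty pro.
    return ("S", subject) if subject else ("VP", None)


def link_segments(segments: list[Tagged]) -> list[Segment]:
    """Resolve pro in subjectless segments to the current topic (topic chain)."""
    topic: Optional[str] = None
    linked: list[Segment] = []
    for words in segments:
        kind, head = classify(words)
        if kind == "S":
            topic = head      # an overt subject becomes the current topic
            subject = head
        elif kind == "VP":
            subject = topic   # pro inherits the current topic (None if there is none yet)
        else:
            topic = head      # assumption: an NP segment is the next segment's subject
            subject = None
        linked.append(Segment(words, kind, subject))
    return linked


if __name__ == "__main__":
    print(split_segments("张三走进房间，看见李四，很高兴"))
    # ['张三走进房间', '看见李四', '很高兴']
    tagged = [
        [("张三", "N"), ("走进", "V"), ("房间", "N")],
        [("看见", "V"), ("李四", "N")],
        [("很", "D"), ("高兴", "V")],
    ]
    for seg in link_segments(tagged):
        print(seg.kind, seg.subject)
    # S 张三
    # VP 张三
    # VP 张三
```

In this toy example the second and third segments have no overt subject, so both pros are linked to 张三, the topic set by the first segment. The adjacency-only rule ignores the alternative reading in which an NP segment is the object of the previous segment. A fuller treatment would need the lexical and subcategorization cues the abstract mentions to choose between such readings.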

Publication date: 1993